feat: add support for better handling with large arrays #402
Conversation
A couple of things:
Really solid work! We can include this in Fastify v4.
For discussion only: I tried changing these two constants to `const LARGE_ARRAY_SIZE = Math.pow(2, 14)` and `const MULTI_ARRAY_LENGHT = Math.pow(2, 10)`, and the result is:

```
FJS creation x 8,059 ops/sec ±17.29% (88 runs sampled)
CJS creation x 143,612 ops/sec ±0.41% (92 runs sampled)
AJV Serialize creation x 64,358,288 ops/sec ±0.59% (96 runs sampled)
JSON.stringify array x 5,771 ops/sec ±0.74% (99 runs sampled)
fast-json-stringify array default x 7,984 ops/sec ±1.29% (97 runs sampled)
fast-json-stringify array json-stringify x 7,263 ops/sec ±0.32% (98 runs sampled)
fast-json-stringify array array-join x 6,999 ops/sec ±0.88% (96 runs sampled)
compile-json-stringify array x 7,810 ops/sec ±0.38% (97 runs sampled)
AJV Serialize array x 7,852 ops/sec ±0.17% (97 runs sampled)
JSON.stringify large array x 365 ops/sec ±0.14% (93 runs sampled)
fast-json-stringify large array default x 357 ops/sec ±0.18% (71 runs sampled)
fast-json-stringify large array json-stringify x 326 ops/sec ±0.12% (74 runs sampled)
fast-json-stringify large array array-join x 316 ops/sec ±0.57% (72 runs sampled)
compile-json-stringify large array x 310 ops/sec ±1.16% (65 runs sampled)
AJV Serialize large array x 402 ops/sec ±0.09% (94 runs sampled)
JSON.stringify long string x 11,856 ops/sec ±0.37% (96 runs sampled)
fast-json-stringify long string x 11,934 ops/sec ±0.18% (98 runs sampled)
compile-json-stringify long string x 11,918 ops/sec ±0.24% (95 runs sampled)
AJV Serialize long string x 22,056 ops/sec ±0.15% (98 runs sampled)
JSON.stringify short string x 12,208,053 ops/sec ±0.13% (95 runs sampled)
fast-json-stringify short string x 38,396,194 ops/sec ±3.45% (88 runs sampled)
compile-json-stringify short string x 30,411,984 ops/sec ±1.07% (96 runs sampled)
AJV Serialize short string x 37,409,999 ops/sec ±0.54% (93 runs sampled)
JSON.stringify obj x 3,057,954 ops/sec ±1.76% (93 runs sampled)
fast-json-stringify obj x 12,644,309 ops/sec ±0.28% (96 runs sampled)
compile-json-stringify obj x 19,205,007 ops/sec ±0.69% (97 runs sampled)
AJV Serialize obj x 10,518,552 ops/sec ±1.19% (92 runs sampled)
JSON stringify date x 1,135,091 ops/sec ±0.41% (98 runs sampled)
fast-json-stringify date format x 1,995,636 ops/sec ±0.27% (90 runs sampled)
compile-json-stringify date format x 1,125,693 ops/sec ±0.56% (97 runs sampled)
```

You can see:

```
fast-json-stringify large array default x 357 ops/sec ±0.18% (71 runs sampled)
fast-json-stringify large array json-stringify x 326 ops/sec ±0.12% (74 runs sampled)
fast-json-stringify large array array-join x 316 ops/sec ±0.57% (72 runs sampled)
```

The default is faster, but I don't know why……

Update: tested on Node.js v16.14.2
Can you add a test for all of those settings?
@xtx1130 I believe it's due to the choice of
It was a script I made (somewhat messy by the end). I'll tidy it up and add a gist of it to the PR's description. I don't think it should be added to the code base, given we already have a benchmark suite set up.
I see your point. Indeed that'd help spot differences in the smaller ranges but, at the same time, given the overall graph scale, it'd make differences harder to spot in the larger ranges. Visually we'd see small changes, but those changes would be much more meaningful than in the smaller ranges. And, since this is really only an issue for very large arrays, I compromised on displaying actual time differences instead of a log-scaled version so we could have a more direct and easier-to-read result. For instance, the data displayed in the
lgtm
@simoneb I added the scripts used to the PR description (https://gist.github.com/wilkmaia/c2a43a145fcd0f9dda06710933c26850) under
Case 1: `const LARGE_ARRAY_SIZE = Math.pow(2,15)`

Result 1:

```
fast-json-stringify large array default x 168 ops/sec ±1.81% (85 runs sampled)
fast-json-stringify large array json-stringify x 180 ops/sec ±0.58% (84 runs sampled)
fast-json-stringify large array array-join x 78.31 ops/sec ±2.93% (59 runs sampled)
```

Result 2:

```
fast-json-stringify large array default x 176 ops/sec ±0.28% (90 runs sampled)
fast-json-stringify large array json-stringify x 179 ops/sec ±0.81% (83 runs sampled)
fast-json-stringify large array array-join x 77.49 ops/sec ±3.38% (60 runs sampled)
```

Case 2: `const LARGE_ARRAY_SIZE = 20000`

Result 1:

```
fast-json-stringify large array default x 124 ops/sec ±0.55% (78 runs sampled)
fast-json-stringify large array json-stringify x 298 ops/sec ±0.15% (92 runs sampled)
fast-json-stringify large array array-join x 137 ops/sec ±2.38% (58 runs sampled)
```

Result 2:

```
fast-json-stringify large array default x 122 ops/sec ±0.90% (51 runs sampled)
fast-json-stringify large array json-stringify x 296 ops/sec ±0.34% (94 runs sampled)
fast-json-stringify large array array-join x 140 ops/sec ±1.94% (59 runs sampled)
```

There is a very big difference between … In default mode, …
@xtx1130 that's indeed an odd behavior. I'm not, though, able to reproduce it. I've tested on a different system (node.js v16.14.2) and got the following results:

LARGE_ARRAY_SIZE = Math.pow(2, 15)

LARGE_ARRAY_SIZE = 2e4

As expected, the higher the value of `LARGE_ARRAY_SIZE`, …

Is it possible you had fluctuating CPU usage on your system during those tests? Maybe some process in the background or things like video/audio processing were running alongside the tests. That could explain the discrepancy, at some level. Besides that, with only the info we have, I don't have many ideas as to why you're experiencing these different results. If you have access to a different system, you could maybe check whether you can reproduce that behavior there as well. Finally, I'm assuming you're running the benchmark (…
I get the same result as @xtx1130. I am not sure if it is affected by the environment. Here is the system information:
@climba03003 @xtx1130 when you have some time, can you please re-run the benchmark using …
@wilkmaia Here are the multiple benchmarks I ran to figure out which number is actually dropping. You may check the last one. To get everyone in sync, the code in question is:

```js
switch (largeArrayMechanism) {
  case 'default':
    break
  case 'json-stringify':
    code += `
      return JSON.stringify(obj)`
    break
  case 'array-join':
    code += `
      return \`[\${obj.map(${result.mapFnName}).join(',')}]\``
    break
  default:
    throw new Error(`Unsupported large array mechanism ${largeArrayMechanism}`)
}
```

Math.pow(2, 18) …
Thanks for the detailed info, @climba03003. Apparently we're seeing some performance difference based on the host machine. That's my best guess with the info we have so far. I don't think this changes much about what this PR is aiming at, anyway, especially with the latest changes, which included an option for the user to set how large a large array is. For instance, for your specific scenario you might want to set that to …
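For reference, a minimal sketch of how these options might be used; the `largeArrayMechanism` values come from the snippet above, while the size option name (`largeArraySize` here) is an assumption and may differ from the final API:

```js
const fastJson = require('fast-json-stringify')

const stringify = fastJson({
  type: 'array',
  items: { type: 'object', properties: { id: { type: 'integer' } } }
}, {
  // One of 'default', 'json-stringify' or 'array-join'.
  largeArrayMechanism: 'json-stringify',
  // Assumed option name: how many elements make an array "large" for this workload.
  largeArraySize: 20000
})
```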
LGTM.
Thanks for @climba03003's test cases. I'm +-1 on …
If we're happy with the result, I think this can now be merged.
Intro
This PR is a tentative approach at improving the behavior described in fastify/fastify#3498. In short, for very large arrays (approx. over 20k elements) the overhead introduced by `fast-json-stringify` becomes relevant and makes the stringifying process longer.

Closes fastify/fastify#3498.
Analysis
Some simple benchmarks were run, both to better understand the issue and to measure the impact of the various approaches attempted at improving it.
Methodology
A variable-length array with a similar spec to the large one available in the linked issue was stringified using two approaches:

- `JSON.stringify` as baseline
- `fast-json-stringify`

Data on array length and time taken for the process to complete was collected and plotted over several different graphs, displayed below. `JSON.stringify`'s time to complete is plotted in orange and `fast-json-stringify`'s in blue. Time is displayed in nanoseconds. Data for each array length was averaged over 30 different measurements.

The benchmark script used (as well as helpers and dependencies) is available at https://gist.github.com/wilkmaia/c2a43a145fcd0f9dda06710933c26850.
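As a rough illustration of that methodology (not the gist itself), a measurement loop along these lines could be used; the item schema and generated data below are illustrative placeholders, not the exact ones from the benchmark:

```js
const fastJson = require('fast-json-stringify')

// Placeholder schema; the real benchmark uses the spec from the linked issue.
const stringify = fastJson({
  type: 'array',
  items: {
    type: 'object',
    properties: { id: { type: 'integer' }, name: { type: 'string' } }
  }
})

// Average the time (in nanoseconds) of `runs` calls of fn(data).
function measure (fn, data, runs = 30) {
  let total = 0n
  for (let i = 0; i < runs; i++) {
    const start = process.hrtime.bigint()
    fn(data)
    total += process.hrtime.bigint() - start
  }
  return Number(total / BigInt(runs))
}

for (const length of [1024, 4096, 16384, 65536]) {
  const data = Array.from({ length }, (_, i) => ({ id: i, name: `item-${i}` }))
  console.log(length, measure(JSON.stringify, data), measure(stringify, data))
}
```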
Results
Baseline Data
Baseline data was initially collected for future comparison. It shows that the time the two functions take to process the data starts to diverge significantly for arrays with length somewhere between 16k and 32k.
Closer Look
Looking at data over a smaller interval (length of `5000` to `32000`) one can see that the discrepancy starts getting more significant for arrays with about 20k items or more. In the following sections we'll consider a large array to be one with `20000` or more elements.

Approach 1
`Array.join` instead of string concatenation for large arrays.

This approach compromises performance to maintain high utility. The performance gain for the largest array tested was about 33%, and there's no loss in the lib's feature set. The gain comes from the fact that, for large arrays, `Array.join` currently performs better than string concatenation in V8 [1]. Browser performance apparently has been varying over the last years.
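To make the difference concrete, here's a minimal sketch of the two strategies; `serializeItem` is a hypothetical stand-in for the per-element code the library generates, not its actual output:

```js
// Hypothetical per-item serializer, standing in for the generated per-element code.
const serializeItem = (item) => `{"id":${item.id}}`

// Current behavior (simplified): build the output by concatenating strings.
function serializeByConcat (arr) {
  let json = '['
  for (let i = 0; i < arr.length; i++) {
    if (i > 0) json += ','
    json += serializeItem(arr[i])
  }
  return json + ']'
}

// Approach 1 (simplified): map each element and join once, which V8 currently
// handles better than repeated concatenation for large arrays.
function serializeByJoin (arr) {
  return `[${arr.map(serializeItem).join(',')}]`
}

console.log(serializeByJoin([{ id: 1 }, { id: 2 }])) // [{"id":1},{"id":2}]
```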
Approach 2

`Array.join` instead of string concatenation for all arrays.

This was the worst approach of all, performance-wise. With it, the worst case scenario ran only about 13% faster. That was expected, though, given `Array.join` performs significantly worse for smaller arrays. There's no point in following this approach, since it offers no benefit over Approach 1.
Approach 3

`JSON.stringify` all arrays.

This approach focuses on maximizing performance to the detriment of utility. With it, the lib loses spec validation for arrays and their elements, but the stringify process completes quickly. The performance gain was about 77% for the worst case scenario, resulting in overall performance extremely similar to `JSON.stringify` for large arrays and quicker for small ones.
Approach 4

`JSON.stringify` for large arrays.

This approach gives up a bit of the performance gained in Approach 3 in order to provide more utility to the lib's users. Only large arrays are submitted to `JSON.stringify`; all others still benefit from spec validation. The performance gain for this approach was about 76% for the worst case scenario.
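Conceptually, Approach 4 boils down to a size check before choosing the serialization path. This is a simplified sketch, not the library's generated code; the `20000` threshold mirrors the definition of a large array used above:

```js
// Threshold above which an array is considered "large" (see definition above).
const LARGE_ARRAY_SIZE = 20000

// serializeItem stands in for the schema-based per-element serializer.
function serializeArray (arr, serializeItem) {
  if (arr.length >= LARGE_ARRAY_SIZE) {
    // Large arrays: skip per-element handling and fall back to JSON.stringify.
    return JSON.stringify(arr)
  }
  // Smaller arrays: keep the schema-based serialization (shown here as a simple join).
  return `[${arr.map(serializeItem).join(',')}]`
}
```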
Conclusion

In the image above there's a compiled summary of the baselines plus the results of `fast-json-stringify`'s updated approaches, disregarding the smallest datasets (fewer than `2048` elements).

Judging by that, I believe the best solution would be to either implement Approach 1 or Approach 4 or, as proposed in this PR, to add support for the user to select whether they want the original implementation or one of the two proposed variants.
References
[1]: Simple benchmark for string concatenation vs `Array.join`
Project Benchmark
Previous approach

Output for `npm run benchmark` before changes:

New approach

Output for `npm run benchmark` after changes:

Checklist
- run `npm run test` and `npm run benchmark`
- … and the Code of conduct